Unsupervised lattice-based acoustic model adaptation for speaker-dependent conversational telephone speech transcription
Authors
Abstract
This paper examines the application of lattice adaptation techniques to speaker-dependent models for conversational telephone speech transcription. Given sufficient training data per speaker, it is feasible to build adapted speaker-dependent models using lattice MLLR and lattice MAP. Experiments on iterative and cascaded adaptation are presented. Additionally, various strategies for thresholding frame posteriors are investigated, and it is shown that accumulating statistics from the local best-confidence path is sufficient to achieve optimal adaptation. Overall, an iterative cascaded lattice system reduced WER by 7.0% absolute, a 0.8% absolute gain over transcript-based adaptation. Lattice adaptation reduced the gap between unsupervised and supervised adaptation from 2.5% to 1.7%.
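As a rough illustration of what thresholding frame posteriors means when accumulating MAP statistics, the sketch below re-estimates a single Gaussian mean from posterior-weighted frames using the standard MAP mean update. It is not the paper's implementation: the function name `map_mean_update`, the prior weight `tau`, the `threshold` value, and the toy data are all assumptions made for illustration, and the posteriors here are simply given rather than computed from a lattice.

```python
from typing import List, Sequence


def map_mean_update(
    prior_mean: Sequence[float],
    frames: Sequence[Sequence[float]],
    posteriors: Sequence[float],
    tau: float = 10.0,        # hypothetical prior weight, must be > 0
    threshold: float = 0.0,   # hypothetical posterior cutoff
) -> List[float]:
    """MAP re-estimate of one Gaussian mean from posterior-weighted frames.

    Frames whose posterior falls below `threshold` contribute no statistics.
    """
    occupancy = 0.0
    weighted_sum = [0.0] * len(prior_mean)
    for frame, gamma in zip(frames, posteriors):
        if gamma < threshold:
            continue  # discard low-confidence frames
        occupancy += gamma
        for d, value in enumerate(frame):
            weighted_sum[d] += gamma * value
    # Standard MAP mean update: interpolate the prior mean with the data.
    return [
        (tau * m + s) / (tau + occupancy)
        for m, s in zip(prior_mean, weighted_sum)
    ]


# Toy data (assumed): two confident frames near 1.0, one unreliable outlier.
prior = [0.0]
frames = [[1.0], [1.0], [5.0]]
gammas = [0.9, 0.8, 0.1]

all_frames = map_mean_update(prior, frames, gammas, tau=1.0)
confident_only = map_mean_update(prior, frames, gammas, tau=1.0, threshold=0.5)
```

With the threshold at 0.5, the low-posterior outlier frame is excluded, so the adapted mean is pulled less far from the prior than when every frame contributes.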
Related resources
Conversational telephone speech recognition
This paper describes the development of a speech recognition system for the processing of telephone conversations, starting with a state-of-the-art broadcast news transcription system. We identify major changes and improvements in acoustic and language modeling, as well as decoding, which are required to achieve state-of-the-art performance on conversational speech. Some major changes on the aco...
Two-pass decision tree construction for unsupervised adaptation of HMM-based synthesis models
Hidden Markov model (HMM)-based speech synthesis systems possess several advantages over concatenative synthesis systems. One such advantage is the relative ease with which HMM-based systems are adapted to speakers not present in the training dataset. Speaker adaptation methods used in the field of HMM-based automatic speech recognition (ASR) are adopted for this task. In the case of unsupervi...
Automatically learning speaker-independent acoustic subword units
We investigate methods for unsupervised learning of sub-word acoustic units of a language directly from speech. We demonstrate that states of a hidden Markov model “grown” using a novel modification of the maximum likelihood successive state splitting algorithm correspond very well with the phones of the language. In particular, the correspondence between the Viterbi state sequence for unseen s...
Unsupervised speaker indexing using anchor models and automatic transcription of discussions
We present unsupervised speaker indexing combined with automatic speech recognition (ASR) for speech archives such as discussions. Our proposed indexing method is based on anchor models, by which we define a feature vector based on the similarity with speakers of a large-scale speech database. Several techniques are introduced to improve discriminant ability. ASR is performed using the results ...
A study of irrelevant variability normalization based training and unsupervised online adaptation for LVCSR
This paper presents an experimental study of a maximum likelihood (ML) approach to irrelevant variability normalization (IVN) based training and unsupervised online adaptation for large vocabulary continuous speech recognition. A moving-window based frame labeling method is used for acoustic sniffing. The IVN-based approach achieves a 10% relative word error rate reduction over an ML-trained bas...
Publication date: 2009